Hannah Meyer
September 2025
For a new project with your own data, independent of this tutorial, follow the same scheme:
To generate a new file for your analysis:
choose File > New File
select:
Have a look at this blog post, with more advice on the structure and sub-folders in your project directory.
RStudio provides a number of cheat sheets for the RStudio interface setup, R Markdown etc. To access them, choose Help > Cheatsheets:
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
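The Conflicts message above means that, once the tidyverse is attached, a bare call to filter() or lag() resolves to the dplyr version. A minimal sketch of calling either version explicitly with the package::function syntax follows; the toy data frame and its values are made up for illustration and are not the tutorial data:

```r
library(dplyr)

# Made-up example data, not the tutorial data set
toy <- data.frame(
  year = c(1968, 1969, 1970),
  cluster = c("HK68", "HK68", "EN72")
)

# dplyr::filter() keeps the rows that satisfy a condition
hk68 <- dplyr::filter(toy, cluster == "HK68")

# stats::filter() is unrelated: it applies a linear filter to a numeric series
moving_avg <- stats::filter(c(1, 2, 3, 4), rep(1 / 2, 2))
```

Writing the package prefix is optional for dplyr once it is attached, but it makes clear which of the two functions is meant.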
function_output | what the function returns
<- | the assignment operator
function | name of the function
argument1 | first argument the function accepts
something | our specification to argument1
argument2 | second argument the function accepts
something_else | our specification to argument2
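As a small illustration of this scheme, here is a call to the base R function round(). The pattern in the first comment is an assumed reading of the table above, and the object name rounded is chosen only for this sketch:

```r
# Assumed general pattern:
# function_output <- function(argument1 = something, argument2 = something_else)

# round() accepts the arguments x (the number) and digits (decimal places)
rounded <- round(x = 3.14159, digits = 2)

# Typing the object name prints what the function returned
rounded
```

Here rounded plays the role of function_output, round is the function, x and digits are the arguments, and 3.14159 and 2 are our specifications to them.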
function_output | coord
<- | the assignment operator
function | read_csv
argument1 | file
something | “data/2004_Science_Smith_data.csv”
Data from Smith et al. (2004): http://www.antigenic-cartography.org/
## Rows: 322 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): name, cluster, type, location
## dbl (5): year, x.coordinate, y.coordinate, lat, lng
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
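The last two lines of the message suggest two ways to deal with the column specification. A minimal sketch of both, assuming a tiny made-up CSV written to a temporary file as a stand-in for the tutorial data file (the file contents and object names are hypothetical):

```r
library(readr)

# Write a tiny made-up CSV to a temporary file
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c("name,year,cluster",
    "A/1/68,1968,HK68",
    "A/2/72,1972,EN72"),
  csv_file
)

# Option 1: declare the column types up front (c = character, d = double)
toy <- read_csv(file = csv_file, col_types = "cdc")

# Option 2: let readr guess the types but suppress the message
toy_quiet <- read_csv(file = csv_file, show_col_types = FALSE)

# Retrieve the full column specification afterwards
spec(toy)
```

Declaring the types explicitly has the advantage that an unexpected value in a column is reported as a parsing problem instead of silently changing the guessed type.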
1. Use ? to have a look at the documentation of read_csv by typing ?read_csv.
2. Look at the coord object by creating a new chunk, typing coord and executing the code chunk.
3. How is the message printed by read_csv represented in coord?
1. Use ? to have a look at the documentation of read_csv.
read_delim {readr} R Documentation
Read a delimited file (including csv & tsv) into a tibble
Description
read_csv() and read_tsv() are special cases of the general read_delim().
They're useful for reading the most common types of flat file data, comma
separated values and tab separated values, respectively [...]
2. Look at the coord object by creating a new chunk, typing coord and executing the code chunk.
## # A tibble: 322 × 9
## name year cluster type x.coordinate y.coordinate location lat lng
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 BI/15793/… 1968 HK68 AG 4.05 15.0 BILTHOV… 52.1 5.02
## 2 BI/16190/… 1968 HK68 AG 4.10 14.8 BILTHOV… 52.1 5.02
## 3 BI/16398/… 1968 HK68 AG 4.36 13.9 BILTHOV… 52.1 5.02
## 4 BI/808/69 1969 HK68 AG 3.87 14.3 BILTHOV… 52.1 5.02
## 5 BI/908/69 1969 HK68 AG 4.87 14.1 BILTHOV… 52.1 5.02
## 6 BI/17938/… 1969 HK68 AG 4.40 14.9 BILTHOV… 52.1 5.02
## 7 BI/93/70 1970 HK68 AG 5.06 14.5 BILTHOV… 52.1 5.02
## 8 BI/2668/70 1970 HK68 AG 4.82 15.5 BILTHOV… 52.1 5.02
## 9 BI/6449/71 1971 HK68 AG 3.87 15.9 BILTHOV… 52.1 5.02
## 10 BI/21438/… 1971 HK68 AG 4.27 14.1 BILTHOV… 52.1 5.02
## # ℹ 312 more rows
3. How is the message printed by read_csv represented in coord?
Compare the message printed by read_csv:
Parsed with column specification:
cols(
name = col_character(),
year = col_double(),
cluster = col_character(),
type = col_character(),
x.coordinate = col_double(),
y.coordinate = col_double(),
location = col_character(),
lat = col_double(),
lng = col_double()
)
to the column specification in coord:
## # A tibble: 6 × 9
## name year cluster type x.coordinate y.coordinate location lat lng
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 BI/15793/68 1968 HK68 AG 4.05 15.0 BILTHOV… 52.1 5.02
## 2 BI/16190/68 1968 HK68 AG 4.10 14.8 BILTHOV… 52.1 5.02
## 3 BI/16398/68 1968 HK68 AG 4.36 13.9 BILTHOV… 52.1 5.02
## 4 BI/808/69 1969 HK68 AG 3.87 14.3 BILTHOV… 52.1 5.02
## 5 BI/908/69 1969 HK68 AG 4.87 14.1 BILTHOV… 52.1 5.02
## 6 BI/17938/69 1969 HK68 AG 4.40 14.9 BILTHOV… 52.1 5.02
The most common data types in R (base R and tidyverse) are:
int | integers | 1, 2, 3
dbl | doubles | 1.2, 1.7, 9.0
chr | character | “a”, “b”, “word”
lgl | logical | TRUE or FALSE
fctr | factors | categorical variables with fixed values
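The types in this table can be checked directly in base R with typeof() and class(). A short sketch; the object names and the factor labels are made up for illustration:

```r
x_int <- 2L                            # the L suffix creates an integer
x_dbl <- 1.7                           # numbers are doubles by default
x_chr <- "word"
x_lgl <- TRUE
x_fct <- factor(c("AG", "SR", "AG"))   # hypothetical category labels

typeof(x_int)   # "integer"
typeof(x_dbl)   # "double"
typeof(x_chr)   # "character"
typeof(x_lgl)   # "logical"
class(x_fct)    # "factor"
levels(x_fct)   # "AG" "SR"
```

The abbreviations int, dbl, chr, lgl and fctr are what a tibble prints below each column name.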
data.frame
## name year cluster type x.coordinate y.coordinate location lat lng
## 1 BI/15793/68 1968 HK68 AG 4.048064 14.97272 BILTHOVEN 52.14 5.02
## 2 BI/16190/68 1968 HK68 AG 4.103302 14.80633 BILTHOVEN 52.14 5.02
## 3 BI/16398/68 1968 HK68 AG 4.363448 13.89293 BILTHOVEN 52.14 5.02
## 4 BI/808/69 1969 HK68 AG 3.871698 14.25529 BILTHOVEN 52.14 5.02
## 5 BI/908/69 1969 HK68 AG 4.868656 14.09319 BILTHOVEN 52.14 5.02
## 6 BI/17938/69 1969 HK68 AG 4.400375 14.86012 BILTHOVEN 52.14 5.02
tibble
## # A tibble: 6 × 9
## name year cluster type x.coordinate y.coordinate location lat lng
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 BI/15793/68 1968 HK68 AG 4.05 15.0 BILTHOV… 52.1 5.02
## 2 BI/16190/68 1968 HK68 AG 4.10 14.8 BILTHOV… 52.1 5.02
## 3 BI/16398/68 1968 HK68 AG 4.36 13.9 BILTHOV… 52.1 5.02
## 4 BI/808/69 1969 HK68 AG 3.87 14.3 BILTHOV… 52.1 5.02
## 5 BI/908/69 1969 HK68 AG 4.87 14.1 BILTHOV… 52.1 5.02
## 6 BI/17938/69 1969 HK68 AG 4.40 14.9 BILTHOV… 52.1 5.02
c()
## [1] 1.6 2.5 3.2
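The three structures shown above can be created by hand as follows. This is a minimal sketch with made-up values; the object names are chosen only for this example:

```r
library(tibble)

# A vector created with c() holds values of a single type
x <- c(1.6, 2.5, 3.2)

# The same made-up columns as a base R data.frame and as a tibble
df <- data.frame(name = c("a", "b", "c"), value = x)
tb <- tibble(name = c("a", "b", "c"), value = x)

class(df)
class(tb)

# Converting between the two
tb_from_df <- as_tibble(df)
df_from_tb <- as.data.frame(tb)
```

A tibble is still a data.frame, so functions written for data.frames generally accept it; the main visible difference is the more compact printing with column types.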
Tibbles and data.frames
The tidyverse uses data in the long format:
Variables
Observations
## # A tibble: 6 × 9
## name year cluster type x.coordinate y.coordinate location lat lng
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 BI/15793/68 1968 HK68 AG 4.05 15.0 BILTHOV… 52.1 5.02
## 2 BI/16190/68 1968 HK68 AG 4.10 14.8 BILTHOV… 52.1 5.02
## 3 BI/16398/68 1968 HK68 AG 4.36 13.9 BILTHOV… 52.1 5.02
## 4 BI/808/69 1969 HK68 AG 3.87 14.3 BILTHOV… 52.1 5.02
## 5 BI/908/69 1969 HK68 AG 4.87 14.1 BILTHOV… 52.1 5.02
## 6 BI/17938/69 1969 HK68 AG 4.40 14.9 BILTHOV… 52.1 5.02
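If your own data arrive in a wide layout, tidyr can reshape them into the long format. A minimal sketch, assuming a made-up table of counts with one column per year (the values and the column name n_isolates are hypothetical):

```r
library(tidyr)
library(tibble)

# Made-up wide table: one row per cluster, one column per year
wide <- tibble(
  cluster = c("HK68", "EN72"),
  `1968` = c(3, 0),
  `1972` = c(1, 4)
)

# Long format: one row per cluster and year combination
long <- pivot_longer(
  wide,
  cols = c(`1968`, `1972`),
  names_to = "year",
  values_to = "n_isolates"
)
long
```

In the long table every column is a variable (cluster, year, n_isolates) and every row is one observation, which is the layout ggplot2 expects.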
Use the output of running coord to determine:
## # A tibble: 322 × 9
## name year cluster type x.coordinate y.coordinate location lat lng
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 BI/15793/… 1968 HK68 AG 4.05 15.0 BILTHOV… 52.1 5.02
## 2 BI/16190/… 1968 HK68 AG 4.10 14.8 BILTHOV… 52.1 5.02
## 3 BI/16398/… 1968 HK68 AG 4.36 13.9 BILTHOV… 52.1 5.02
## 4 BI/808/69 1969 HK68 AG 3.87 14.3 BILTHOV… 52.1 5.02
## 5 BI/908/69 1969 HK68 AG 4.87 14.1 BILTHOV… 52.1 5.02
## 6 BI/17938/… 1969 HK68 AG 4.40 14.9 BILTHOV… 52.1 5.02
## 7 BI/93/70 1970 HK68 AG 5.06 14.5 BILTHOV… 52.1 5.02
## 8 BI/2668/70 1970 HK68 AG 4.82 15.5 BILTHOV… 52.1 5.02
## 9 BI/6449/71 1971 HK68 AG 3.87 15.9 BILTHOV… 52.1 5.02
## 10 BI/21438/… 1971 HK68 AG 4.27 14.1 BILTHOV… 52.1 5.02
## # ℹ 312 more rows
coord contains the processed and formatted publicly available data from Smith et al. (2004):
name | name of virus isolate
year | year of isolation
cluster | derived cluster
type | serum or antigen measurement
x.coordinate | x coordinate in antigenic space
y.coordinate | y coordinate in antigenic space
location | location of virus measurement
lat | latitude of location
lng | longitude of location
1. Run ggplot(data = coord). What do you see?
2. Which variables in coord could we use?
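To see the difference between a plot object with data only and one with a layer, here is a minimal sketch on a made-up data frame. The values and the object names toy, p_toy and p_points are hypothetical; only the column names follow coord:

```r
library(ggplot2)

# Made-up stand-in for a few rows of coord
toy <- data.frame(
  x.coordinate = c(4.0, 4.4, 9.1, 9.6),
  y.coordinate = c(15.0, 14.2, 11.3, 10.8),
  cluster = c("HK68", "HK68", "EN72", "EN72")
)

# Data only: this draws an empty panel because there are no layers yet
p_toy <- ggplot(data = toy)

# Adding a geom with an aesthetic mapping tells ggplot what to draw
p_points <- p_toy +
  geom_point(aes(x = x.coordinate, y = y.coordinate, color = cluster))
p_points
```

Storing the base plot in an object and adding layers with + is the same pattern the code in this tutorial uses with p.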
we use?
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3")
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, color=cluster),
shape=17) +
scale_color_brewer(type="qual", palette = "Set3")
1. Try the aesthetics size and shape. Does this convey the same level of information as a color scale?
2. Which variable in coord would suit a shape scale? Add a shape aesthetic for the variable you identified.
1. Try the aesthetics size and shape. Does this convey the same level of information as a color scale?
## Warning: Using size for a discrete variable is not advised.
## Warning: The shape palette can deal with a maximum of 6 discrete values because more
## than 6 becomes difficult to discriminate
## ℹ you have requested 11 values. Consider specifying shapes manually if you need
## that many of them.
## Warning: Removed 111 rows containing missing values or values outside the scale range
## (`geom_point()`).
2. Which variable in coord would suit a shape scale? Add a shape aesthetic for the variable you identified.
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set1")
-> Not enough classes in palette
-> color is inside the aesthetics mapping; if manually setting colors, move outside of aes
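Both notes can be illustrated on a small made-up data set. This sketch contrasts mapping a variable to color inside aes() with setting one fixed color outside aes(), and looks up how many colors the two Brewer palettes provide. The toy data, the object names and the hex colors are chosen for illustration only:

```r
library(ggplot2)

# Made-up data with three groups
toy <- data.frame(
  x = c(1, 2, 3),
  y = c(2, 1, 3),
  cluster = c("a", "b", "c")
)

# Mapping: the color depends on a variable, so it goes inside aes()
p_mapped <- ggplot(toy) +
  geom_point(aes(x = x, y = y, color = cluster))

# Setting: one fixed color for all points goes outside aes()
p_fixed <- ggplot(toy) +
  geom_point(aes(x = x, y = y), color = "steelblue")

# Manually chosen colors per group are supplied through a manual scale
p_manual <- p_mapped +
  scale_color_manual(values = c(a = "#66c2a5", b = "#fc8d62", c = "#8da0cb"))

# Maximum number of colors available in each Brewer palette
max_colors <- RColorBrewer::brewer.pal.info[c("Set1", "Set3"), "maxcolors"]
max_colors
```

Set1 provides 9 colors and Set3 provides 12, which is why a variable with 11 distinct values fits into Set3 but not into Set1.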
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3") +
coord_fixed()
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3") +
labs(x="Dimension 1 [AU]",
y="Dimension 2 [AU]",
title="Antigenic cartography",
color="Cluster") +
coord_fixed()
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3") +
labs(x="Dimension 1 [AU]",
y="Dimension 2 [AU]",
title="Antigenic cartography",
color="Cluster") +
coord_fixed() +
theme_bw()
1. Type ?coord_ in a new chunk and press tab to see other options.
2. An earlier exercise added a shape aesthetic. Rename the legend title for this aesthetic.
3. Try theme_void(), theme_dark() and theme_classic(). Similar to Exercise 1, you can type ?theme_ and press tab to see other possible built-in themes.
1. Type ?coord_ in a new chunk and press tab to see other options.
2. An earlier exercise added a shape aesthetic. Add this aesthetic here and rename its legend title.
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, shape=type, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3") +
labs(x="Dimension 1 [AU]",
y="Dimension 2 [AU]",
title="Antigenic cartography",
color="Cluster",
shape="Measurement") +
coord_fixed() +
theme_bw()
3. Try theme_void(), theme_dark() and theme_classic(). Similar to Exercise 1, you can type ?theme_ and press tab to see other possible built-in themes.
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, shape=type, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3") +
labs(x="Dimension 1 [AU]",
y="Dimension 2 [AU]",
title="Antigenic cartography",
color="Cluster",
shape="Measurement") +
coord_fixed() +
theme_void()
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, shape=type, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3") +
labs(x="Dimension 1 [AU]",
y="Dimension 2 [AU]",
title="Antigenic cartography",
color="Cluster",
shape="Measurement") +
coord_fixed() +
theme_dark()
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, shape=type, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3") +
labs(x="Dimension 1 [AU]",
y="Dimension 2 [AU]",
title="Antigenic cartography",
color="Cluster",
shape="Measurement") +
coord_fixed() +
theme_classic()
p +
geom_point(aes(x=x.coordinate, y=y.coordinate, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3") +
labs(x="Dimension 1 [AU]",
y="Dimension 2 [AU]",
title="Antigenic cartography",
color="Cluster") +
coord_fixed()
+ theme_light()
As indicated in the error message:
Error: Cannot use `+.gg()` with a single argument. Did you accidentally put
+ on a new line?
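The fix is to end each line with + so that R knows the expression continues. A minimal sketch on made-up data (toy and p_ok are hypothetical names):

```r
library(ggplot2)

# Made-up data for illustration
toy <- data.frame(x = c(1, 2, 3), y = c(2, 1, 3))

# Every line that is continued ends with +, none starts with it
p_ok <- ggplot(toy) +
  geom_point(aes(x = x, y = y)) +
  coord_fixed() +
  theme_light()
p_ok
```

A line that ends without + is a complete expression, so R evaluates it on its own and then fails on the next line that starts with +.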
## Saving 8 x 6 in image
Note: ggsave overwrites the previous
file of that name without warning!
-> labels and legend stay legible, make sure to always choose the right text sizes and image sizes in combination
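The "Saving 8 x 6 in image" message appears when ggsave is called without explicit dimensions. A minimal sketch that sets the size explicitly and guards against overwriting an existing file; the toy data, the file name and the temporary folder are assumptions made for this example:

```r
library(ggplot2)

# Made-up data and plot for illustration
toy <- data.frame(x = c(1, 2, 3), y = c(2, 1, 3))
p_toy <- ggplot(toy) +
  geom_point(aes(x = x, y = y)) +
  theme_bw(base_size = 12)   # base_size controls the overall text size

# Hypothetical output location in the session's temporary folder
out_file <- file.path(tempdir(), "toy_plot.pdf")

# Only save if no file of that name exists, since ggsave overwrites silently
if (!file.exists(out_file)) {
  ggsave(filename = out_file, plot = p_toy,
         width = 8, height = 6, units = "in")
}
```

Fixing width and height together with the text size makes the figure reproducible, because the result no longer depends on the size of the plot pane at the time of saving.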
p + geom_bar(aes(x=year, fill=cluster),
position=position_dodge(preserve="single")) +
scale_fill_brewer(type="qual", palette = "Set3") +
theme_bw()
p + geom_histogram(aes(x=year,
fill=cluster),
position=position_dodge(preserve="single"),
binwidth = 10) +
scale_fill_brewer(type="qual", palette = "Set3") +
theme_bw()
p + geom_boxplot(aes(x=type, y=year, color=type)) +
geom_jitter(aes(x=type, y=year, color=type)) +
theme_bw()
world <- ne_countries(scale = "medium", returnclass = "sf")
g <- ggplot()
g + geom_sf(data = world) +
geom_point(data=coord, aes(x=lng, y=lat, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3")
1. Test different options for the position argument of geom_bar. Hint: use the Details paragraph in ?geom_bar to find a description of possible options.
2. What is the effect of preserve="total" in position_dodge of geom_histogram?
3. What happens if you use aes(fill) instead of aes(color)?
4. Change geom_jitter to geom_point to see why geom_jitter is a better visualisation of the data. Go back to using geom_jitter and play with the width argument to customise your plot.
5. Which theme would suit the world map? Add it to the plot.
1. Test different options for the position argument of geom_bar. Hint: use the Details paragraph in ?geom_bar to find a description of possible options.
Details
By default, multiple bars occupying the same x position will be stacked atop one another by position_stack(). If you want them to be dodged side-to-side, use position_dodge() or position_dodge2(). Finally, position_fill() shows relative proportions at each x by stacking the bars and then standardising each bar to have the same height.
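The three position options named in the Details paragraph can be compared side by side on a small made-up data set. The data and the object names below are hypothetical; layer_data() is used only to inspect the computed bar heights:

```r
library(ggplot2)

# Made-up data: three isolates in 1968 (two clusters), one in 1969
toy <- data.frame(
  year = c(1968, 1968, 1968, 1969),
  cluster = c("a", "a", "b", "a")
)

bars <- ggplot(toy, aes(x = year, fill = cluster))

p_stack <- bars + geom_bar(position = position_stack())  # default: stacked counts
p_dodge <- bars + geom_bar(position = position_dodge())  # side by side
p_fill  <- bars + geom_bar(position = position_fill())   # relative proportions

# With position_fill(), the top of every stacked bar is at 1
fill_heights <- layer_data(p_fill)$ymax
max(fill_heights)
```

Stacking and dodging keep the y axis in counts, while position_fill() rescales each x position to proportions, so absolute numbers are no longer visible.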
p + geom_bar(aes(x=year, fill=cluster),
position=position_stack()) +
scale_fill_brewer(type="qual", palette = "Set3") +
theme_bw()
2. What is the effect of preserve="total" in position_dodge of geom_histogram?
Hint: In the help page for geom_histogram, click on position_dodge to get to the help for this function. From there, you can see:
preserve Should dodging preserve the total width of all elements at a
position, or the width of a single element?
p + geom_histogram(aes(x=year, fill=cluster),
position=position_dodge(preserve="total"),
binwidth = 10) +
scale_fill_brewer(type="qual", palette = "Set3") +
theme_bw()
# Exercises
3. What happens if you use aes(fill) instead of aes(color)?
p + geom_boxplot(aes(x=type, y=year, fill=type)) +
geom_jitter(aes(x=type, y=year, color=type)) +
scale_fill_manual(values=c("#66c2a5", "#fc8d62")) +
labs(x="Measurement",
y="Time",
color="Measurement") +
theme_bw()
4. Change geom_jitter to geom_point to see why geom_jitter is a better visualisation of the data. Go back to using geom_jitter and play with the width argument to customise your plot.
p + geom_boxplot(aes(x=type, y=year, color=type)) +
geom_point(aes(x=type, y=year, color=type)) +
theme_bw()
5. Which theme would suit the world map? Add it to the plot.
world <- ne_countries(scale = "medium",
returnclass = "sf")
g <- ggplot()
g + geom_sf(data = world) +
geom_point(data=coord,
aes(x=lng, y=lat, color=cluster)) +
scale_color_brewer(type="qual", palette = "Set3") +
theme_void()
Fundamentals of Data Visualization, Wilke (2019): https://serialmentor.com/dataviz/ (with free online version!)
Overview of the most appropriate graph for your data type: From data to viz, https://www.data-to-viz.com/